Using N-best lists for Named Entity Recognition from Chinese Speech

Authors

  • Lu-Feng Zhai
  • Pascale Fung
  • Richard M. Schwartz
  • Marine Carpuat
  • Dekai Wu
Abstract

We present the first known result for named entity recognition (NER) in realistic large-vocabulary spoken Chinese. We establish this result by applying a maximum entropy model, currently the single best known approach for textual Chinese NER, to the recognition output of the BBN LVCSR system on Chinese Broadcast News utterances. Our results support the claim that transferring NER approaches from text to spoken language is significantly more difficult for Chinese than for English. We propose re-segmenting the ASR hypotheses and applying post-classification to improve performance. Finally, we introduce a method of using N-best hypotheses that yields a small but useful improvement in NER accuracy. We use acoustic, phonetic, language model, NER and other scores as confidence measures. Experimental results show an average relative improvement of 6.7% in precision and 1.7% in F-measure.
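The abstract says several scores (acoustic, phonetic, language model, NER) serve as confidence measures over N-best hypotheses, but it does not give the combination rule. The sketch below shows one common way such scores could be combined: a weighted sum of log-domain scores used to rerank the list, plus a softmax over the combined scores as a normalized per-hypothesis confidence. This is an illustrative assumption, not the authors' method; the `Hypothesis` structure, the score names, the weights and the example values are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One ASR hypothesis from an N-best list (hypothetical structure)."""
    text: str
    # Log-domain scores keyed by source, e.g. "acoustic", "lm", "ner".
    scores: dict = field(default_factory=dict)


def combined_score(hyp, weights):
    """Weighted sum of log-domain scores; a missing score counts as 0."""
    return sum(w * hyp.scores.get(name, 0.0) for name, w in weights.items())


def rerank(nbest, weights):
    """Return the N-best list sorted by combined score, best first."""
    return sorted(nbest, key=lambda h: combined_score(h, weights), reverse=True)


def posteriors(nbest, weights, scale=1.0):
    """Softmax over combined scores: a normalized confidence per hypothesis."""
    s = [scale * combined_score(h, weights) for h in nbest]
    m = max(s)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in s]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical two-entry N-best list and hand-picked weights.
nbest = [
    Hypothesis("hyp A", {"acoustic": -120.0, "lm": -35.0, "ner": -2.0}),
    Hypothesis("hyp B", {"acoustic": -118.0, "lm": -40.0, "ner": -0.5}),
]
weights = {"acoustic": 1.0, "lm": 0.5, "ner": 2.0}

print(rerank(nbest, weights)[0].text)  # prints "hyp B"
```

Here "hyp B" wins (combined score -139.0 versus -141.5) even though its language model score is worse, because its acoustic and NER scores are better. In practice the weights would be tuned on held-out data rather than set by hand.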

Similar sources

Two Step Chinese Named Entity Recognition Based on Conditional Random Fields Models

This paper mainly describes a Chinese named entity recognition (NER) system, NER@ISCAS, which integrates text, part-of-speech and small-vocabulary character-list features with heuristic post-processing rules for the MSRA NER open track under the framework of Conditional Random Fields (CRFs).

Task Dependent Loss Functions in Speech Recognition: Application to Named Entity Extraction

We present a risk-based decoding strategy for the task of Named Entity identification from speech. This approach does not select the most likely utterance produced by an ASR system, which would be the maximum a-posteriori (MAP) strategy, but instead chooses an utterance from an N-best list in an attempt to minimize the Bayes Risk under loss functions derived specifically for the Named Entity ta...

The ICT statistical machine translation system for IWSLT 2010

This paper illustrates the ICT Statistical Machine Translation system used in the evaluation campaign of the International Workshop on Spoken Language Translation 2010. We participate in the DIALOG tasks for Chinese-to-English and English-to-Chinese translation respectively. For both tasks, our system has achieved significant improvement with several effective methods as follows: 1) refining th...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improving the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Improving Persian Named Entity Recognition Using the Ezafe Construction

Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.), organizations (public and private companies, international institutions, etc.), dates, currencies and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

Publication date: 2004